Abstract

We consider the problem of robustly recovering a $k$-sparse coefficient vector from the Fourier
series that it generates, restricted to the interval $[-\Omega, \Omega]$. The difficulty of this problem is linked
to the superresolution factor SRF, equal to the ratio of the Rayleigh length (inverse of $\Omega$) to
the spacing of the grid supporting the sparse vector. In the presence of additive deterministic
noise of norm $\sigma$, we show upper and lower bounds on the minimax error rate that both scale
like $(\mathrm{SRF})^{2k-1} \sigma$, providing a partial answer to a question posed by Donoho in 1992. The
scaling arises from comparing the noise level to a restricted isometry constant at sparsity $2k$,
or equivalently from comparing $2k$ to the so-called $\sigma$-spark of the Fourier system. The proof
involves new bounds on the singular values of restricted Fourier matrices, obtained in part from
old techniques in complex analysis.

1 Introduction

In this paper we consider approximations in the partial Fourier system

$$ a_j(\omega) = \frac{e^{i j \tau \omega}}{\sqrt{2\Omega}}, \qquad \omega \in [-\Omega, \Omega], $$

where $\tau$ is the grid spacing and $\Omega$ is the band limit. We recover Fourier series when $\Omega = \pi/\tau$, but for
smaller values of $\Omega$ the collection $a_j(\omega)$ is non-orthogonal and redundant.

We are interested in the problem of recovering the coefficients $x_{0,j}$ that enter $k$-sparse expansions
of the form

$$ f(\omega) = \sum_{j \in T} x_{0,j}\, a_j(\omega) + e(\omega), \qquad |T| = k, \tag{1} $$

from the sole knowledge of $f(\omega)$ with $\omega \in [-\Omega, \Omega]$, and where $e(\omega)$ is a perturbation of size $\|e\|_2 \le \sigma$.
The notation $|T|$ refers to the cardinality of $T$. The difficulty of this problem is governed by the
superresolution factor

$$ \mathrm{SRF} = \frac{\pi}{\tau \Omega}, $$

which measures the number of grid points covered by the Rayleigh length $\pi/\Omega$. This paper is concerned
with the precise balance between SRF, the sparsity $k$, and the noise level $\sigma$, for which recovery of
the index set $T$ and the coefficients $x_{0,j}$ is possible.

It is well-known that the sparse recovery problem (1) is one of the simplest mathematical models
that embodies the difficulty of superresolution in diffraction-limited imaging, direction finding, and
bandlimited signal processing. An important alternative would be to let $t_j$ receive any positive
value in place of $j\tau$, but we do not deal with the “off-grid” case in this paper.

Without loss of generality, and for the remainder of the paper, we consider the renormalized
problem

$$ a_j(\theta) = \frac{e^{i j \theta}}{\sqrt{2\pi y}}, \qquad \theta \in [-\pi y, \pi y], $$

where $\theta = \tau\omega$ and $y = \Omega\tau/\pi = 1/\mathrm{SRF}$. We now recover Fourier series when $y = 1$. In the sequel we
assume $y < 1/2$.

1.1 Minimax recovery theory 

Write $f = A x_0 + e$ as a shorthand for an expansion in the dictionary $A_{\theta j} = a_j(\theta)$ with coefficients
$x_{0,j}$, plus some noise $e$. The theory that we now present applies to general matrices$^1$ $A$, not
necessarily to partial Fourier matrices. For an index set $T$, denote by $A_T$ the restriction of $A$ to
columns in $T$. Assume that the columns are unit-normed.

The best achievable error bound on any approximation of $x_0$ from the knowledge of $f$ is linked
to the concept of lower restricted isometry constant. This notion is well-known from compressed
sensing, but is used here in the very different regime of arbitrarily ill-conditioned submatrices $A_T$.

Definition 1. (Lower restricted isometry constant) Let $k > 0$ be an integer. Then

$$ \varepsilon_k = \min_{T : |T| = k} \sigma_{\min}(A_T). $$

Note that $\varepsilon_k = \sqrt{1 - \delta_k}$ in the notation of [?].

Denote by $\tilde{x}$ any estimator of $x_0$ based on the knowledge of $f = A x_0 + e$. The minimax error
of any such estimator, in the situation when $\|x_0\|_0 = |\mathrm{supp}\, x_0| = k$ and $\|e\| \le \sigma$, is

$$ E(k, \sigma) = \inf_{\tilde{x}} \; \sup_{x_0 : \|x_0\|_0 = k} \; \sup_{e : \|e\| \le \sigma} \|\tilde{x} - x_0\|. $$

The minimax error is tightly linked to the value of the lower restricted isometry constant at sparsity
level $2k$. We prove the following result in Section 2.

Theorem 1. Let $k > 0$ be an integer, and let $\sigma > 0$. We have the bounds

$$ \frac{1}{2} \frac{1}{\varepsilon_{2k}}\, \sigma \;\le\; E(k, \sigma) \;\le\; 2\, \frac{1}{\varepsilon_{2k}}\, \sigma. $$

An estimator $\tilde{x}$ is said to be minimax if its error $\sup_{x_0 : \|x_0\|_0 = k} \sup_{e : \|e\| \le \sigma} \|\tilde{x} - x_0\|$ obeys the
same scaling as $E$, up to a multiplicative constant.

The relevance of $\varepsilon_{2k}$ is clear: its inverse is the error magnification factor of any minimax estimator of
$x_0$. Estimation of a general $k$-sparse coefficient sequence is possible if and only if $\sigma$ is small in
comparison to $\varepsilon_{2k}$.

1 Albeit with a continuous row index. Because the column index is finite, this feature is inconsequential and does
not warrant the usual complications of functional analysis.

1.2 The lower restricted isometry constant

The analysis that we present in this paper reveals that $\varepsilon_n$ is controlled by the superresolution factor
via the quantity $c(y) = \sin(\pi y / 2) = \sin\!\big(\pi / (2\, \mathrm{SRF})\big)$.

Theorem 2. There exist $C > 0$ and $y^* > 0$ such that, for all $0 < y < y^*$, and with $c(y) = \sin(\pi y/2)$,

$$ C\, c(y)^n \;\le\; \varepsilon_{n+1} \;\le\; 4\, c(y)^n. $$

We conjecture that the restriction to small $y$ is not needed for the statement to hold. The proof
is based on two distinct results that we present in Section 3:

• Lemma 1, which establishes that, when $y$ is small, the worst-case scenario for the least singular
value is when $j = 0, 1, \ldots, k - 1$ (or any $k$ consecutive integers); and

• Lemma 2, which provides upper and lower bounds for the least singular value in this scenario.

This paper’s main result is obtained by combining Theorem 1 with Theorem 2 when $n + 1 = 2k$.

Corollary 3.

$$ C_{1,k}\, (\mathrm{SRF})^{2k-1} \sigma \;\le\; E(k, \sigma) \;\le\; C_{2,k}\, (\mathrm{SRF})^{2k-1} \sigma. $$

The proof is clear from the fact that $c(y) \asymp (\mathrm{SRF})^{-1}$, and from absorbing the unknown behavior
of $\varepsilon_{2k}$ for small SRF in the pre-constants. For the same reasons as above, we conjecture that the
constants $C_{1,k}$ and $C_{2,k}$ do not depend on $k$.

Note that Corollary 3 is the worst-case bound. There may exist large subsets of vectors $x_0$ that
exhibit further structure than $k$-sparsity, and for which the recovery rate is substantially better
than $(\mathrm{SRF})^{2k-1} \sigma$.
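
To get a feel for the scaling in Corollary 3, the following minimal sketch tabulates $(\mathrm{SRF})^{2k-1}$ next to $c(y)^{-(2k-1)}$ with $y = 1/\mathrm{SRF}$. It leaves out the unknown constants $C_{1,k}$ and $C_{2,k}$, so the numbers only indicate orders of magnitude. The function names and the value SRF = 4 are illustrative choices, not taken from the paper.

```python
import math

def capacity(srf):
    """c(y) = sin(pi * y / 2) with y = 1 / SRF."""
    return math.sin(math.pi / (2.0 * srf))

def srf_scaling(srf, k):
    """(SRF)^(2k-1): the noise magnification scale of Corollary 3, constants omitted."""
    return srf ** (2 * k - 1)

srf = 4.0  # illustrative superresolution factor
for k in (1, 2, 3, 4):
    # columns: sparsity, (SRF)^(2k-1), c(y)^-(2k-1)
    print(k, srf_scaling(srf, k), capacity(srf) ** -(2 * k - 1))
```

Since $2x/\pi \le \sin x \le x$ on $[0, \pi/2]$, the quantity $c(y)$ lies between $1/\mathrm{SRF}$ and $\pi/(2\,\mathrm{SRF})$ whenever $\mathrm{SRF} \ge 1$, so the two columns agree up to a factor of at most $(\pi/2)^{2k-1}$.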

1.3 Related work 

Corollary 3 addresses a special case of a question originally raised by Donoho in 1992 in [10]. In
that paper, Donoho recognizes that the “sparse clumps” signal model is the right notion to achieve
superresolution. Given a vector $x$, he lets $r$ be the smallest integer such that the number of nonzero
elements of $x$ is at most $r$ within any consecutive subset of cardinality $r$ times the Rayleigh length.
Clearly, the set of vectors that satisfies Donoho’s model at level $r$ includes the $r$-sparse vectors. If
$E(r, \sigma)$ denotes the minimax error of estimating a vector at level $r$, under deterministic noise of
level $\sigma$ in $L^2$, then Donoho showed that

$$ C_{1,r}\, (\mathrm{SRF})^{2r-1} \sigma \;\le\; E(r, \sigma) \;\le\; C_{2,r}\, (\mathrm{SRF})^{2r+1} \sigma. $$

Corollary 3 is the statement that there is no gap in this sequence of inequalities, and that
Donoho’s lower bound gives the correct scaling, albeit when $r$ is understood as sparsity rather
than the more general (and more relevant) “sparse clumps” model. It would be very interesting to
close the exponent gap in the latter case as well.

Around the same time, Donoho et al. [12] established that perfect recovery of $k$-sparse positive
vectors was possible from $2k$ low-frequency noiseless measurements, and that the mere positivity
requirement is a sufficient condition to obtain unique recovery. It is worth comparing this result
to very classical work on the trigonometric moment problem [?], where $k$ complex measurements
suffice to determine $k$ real-valued phases and $k$ real-valued positive amplitudes in a model of the
form (1), sampled uniformly in $\omega$. The observation that $m = 2k$ is the minimum number of noiseless
measurements necessary for recovery of a $k$-sparse vector is also clear from the more recent literature
on sparse approximation.

The significance of $2k$ as a threshold for recovery of $k$-sparse vectors also plays a prominent role
in Donoho and Elad’s later work [11]. They define the spark $s$ of a matrix $A$ to be the smallest
number of linearly dependent columns, and go on to show that the representation of the form $Ax$
is unique for any $s/2$-sparse vector $x$. We explain in Section 2.3 why our results can be seen as
a noise-robust version of this observation: the functional inverse of the lower restricted isometry
constant $\varepsilon_k$, i.e., $k$ as a function of $\varepsilon$, qualifies as the $\varepsilon$-spark $s_\varepsilon$ of $A$, and equals twice the sparsity
level of vectors $x$ that are robustly recoverable from $Ax$.

It should be emphasized that our analysis concerns the situation when data are available for
all $\omega \in [-\Omega, \Omega]$, i.e., in the continuum. The same results hold for finely sampled $\omega$, though it is
not the purpose of this paper to discuss precisely what sampling condition will lead to the same
scaling of the minimax error. For superresolution, it appears that the bandwidth parameter plays
a more central role in the recovery scaling than the number of measurements.

A resurgence of interest in the superresolution problem was spurred by the work of Candès
and Fernandez-Granda, who showed that $\ell_1$ minimization$^2$ is able to superresolve spikes that are
isolated, in the sense that their distance is at least a constant times the Rayleigh length. In
this paper’s language, their stability estimate reads $E \lesssim (\mathrm{SRF})^{2r} \sigma$ with $r = 1$. Related important
work is in [?]. The same spike separation condition is also sufficient for other types of
algorithms to perform superresolution, such as Fannjiang and Liao’s work on MUSIC [?], and
Moitra’s work on the matrix pencil method [22], where the separation constant is completely sharp.

As we put the final touches to this paper, we also learned of the work of Morgenshtern and
Candès [23], which shows that the estimate $E \lesssim (\mathrm{SRF})^{2r} \sigma$ continues to hold in the setting of
Donoho’s definition of $r$, for $\ell_1$ minimization on a grid, without the spike separation condition, and
as long as $x_0$ is entrywise nonnegative. It is well-known that $\ell_1$ minimization does not generally
superresolve when $x_0$ has opposite signs and $A$ selects low frequencies.

As mentioned earlier, Theorem 2 is based on upper and lower bounds on the smallest singular
value of $A_{\theta j}$ when $j$ spans a sequence of $k$ consecutive integers (see Lemma 2 in Section 3). The
spectral problem for this matrix was already thoroughly studied in the theory of discrete prolate
sequences by Slepian in [25], who found the asymptotic rate of decay for the eigenvalues of $A^* A$,
both in the limit $N \to \infty$ and SRF $\to 0$. Lemma 2 however concerns the non-asymptotic case,
and could not have been proved with the same techniques$^3$ as in [25]. Note in passing that the
usual operator of time-limiting and band-limiting, giving rise to non-discrete prolate spheroidal
wave functions [26, ?], is of a very different nature from $A$. Its column index is continuous,
and its singular values decay factorially rather than exponentially.

From a practical point of view, it is clear that Corollary 3 is mostly a negative result. For any
SRF greater than 1, the conditioning of the problem grows exponentially in the sparsity level $k$.

2 Minimax recovery and the $\varepsilon$-spark

2.1 Robust $\ell_0$ recovery

Consider data $f = A x_0 + e$ with $\|e\| \le \sigma$, and the $\ell_0$ recovery problem

$$ (P_0) \qquad \min_x \|x\|_0, \qquad \|f - A x\| \le \sigma. $$

2 Or its continuous counterpart, the total variation of a measure, in the gridless case.

3 The techniques in [25] could have led to a weaker form of Theorem 2, which could have sufficed to arrive at
Corollary 3, but would have taken us farther from the conjecture that Corollary 3 holds with $C_1$ and $C_2$ independent
of $k$.

Any minimizer of $(P_0)$ generates an estimator of $x_0$ that we will use to prove the upper bound in
Theorem 1. We now show the role of the lower restricted isometry constant $\varepsilon_{2k}$ at level $2k$ for $\ell_0$
recovery of a $k$-sparse $x_0$.

Theorem 4. Let $k > 0$ be an integer.

(i) Let $\sigma > 0$. Let $x_0 \in \mathbb{R}^n$ with $\|x_0\|_0 = k$, and let $f = A x_0 + e$ for some $\|e\| \le \sigma$. Then any
minimizer $\tilde{x}$ of $(P_0)$ obeys $\|\tilde{x} - x_0\| \le \frac{2}{\varepsilon_{2k}} \sigma$.

(ii) There exists $x_0 \in \mathbb{R}^n$ with $\|x_0\|_0 = k$ such that $f = A x_0$ is explained by a sparser vector
rather than $x_0$ with tolerance $\varepsilon_{2k}$, i.e., there exists $x_1$ for which $\|x_1\|_0 \le k$, $\|x_1 - x_0\| = 1$,
and $\|f - A x_1\| \le \varepsilon_{2k}$.

Proof. Let $k > 0$.

(i) Let $\tilde{x}$ be a minimizer of $(P_0)$, so that $\|f - A\tilde{x}\| \le \sigma$. Since $\|f - A x_0\| \le \sigma$ as well, it follows
that $\|A(\tilde{x} - x_0)\| \le 2\sigma$. We also have $\|\tilde{x}\|_0 \le \|x_0\|_0 \le k$, hence $\|\tilde{x} - x_0\|_0 \le 2k$. By definition
of the lower restricted isometry constant, this implies $\|A(\tilde{x} - x_0)\| \ge \varepsilon_{2k} \|\tilde{x} - x_0\|$. Comparing
the lower and upper bounds for $\|A(\tilde{x} - x_0)\|$, we conclude $\|\tilde{x} - x_0\| \le \frac{2}{\varepsilon_{2k}} \sigma$.

(ii) By definition of the lower restricted isometry constant, we may pick a vector $x$ of sparsity
$\|x\|_0 = 2k$, unit-normalized as $\|x\| = 1$, and such that $\|Ax\| \le \varepsilon_{2k}$. Threshold $x$ to its
$k$ largest components in absolute value; call the resulting $k$-sparse vector $x_1$. Gather the
remaining $k$ components into the $k$-sparse vector $-x_0$. Then $x = x_1 - x_0$ and $\|x_1 - x_0\| = 1$.
Let $f = A x_0$, and observe that $\|f - A x_1\| = \|Ax\| \le \varepsilon_{2k}$.

□


It is not known whether any polynomial-time algorithm can reach those bounds in general. 
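
As a small illustration of Theorem 4(i), the sketch below enumerates all minimizers of $(P_0)$ of a given support for a toy dictionary and checks the bound $\|\tilde{x} - x_0\| \le 2\sigma/\varepsilon_{2k}$ with $k = 1$. Everything specific here is an assumption made for the example: the three unit columns in $\mathbb{R}^2$, the noise vector, and the value of $\sigma$ are hypothetical, and for each support only the least-squares coefficient is tested. Brute-force enumeration is of course exponential in general; it is used only because the example is tiny.

```python
import math
from itertools import combinations

def unit(angle):
    """Unit vector in R^2 at the given angle (radians)."""
    return (math.cos(angle), math.sin(angle))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical dictionary A: three unit-norm columns in R^2.
columns = [unit(0.0), unit(0.3), unit(1.2)]

# Lower restricted isometry constant at sparsity 2k = 2. For two real unit
# columns with inner product g, the Gram matrix [[1, g], [g, 1]] has
# eigenvalues 1 - g and 1 + g, so the smallest singular value is sqrt(1 - |g|).
eps_2 = min(math.sqrt(1.0 - abs(dot(a, b))) for a, b in combinations(columns, 2))

# Data f = A x0 + e with x0 = (1, 0, 0) and a fixed perturbation of norm sigma.
sigma = 0.3
e = tuple(sigma * v for v in unit(2.0))
f = tuple(a + b for a, b in zip(columns[0], e))
assert math.sqrt(dot(f, f)) > sigma  # the zero vector is infeasible, so ||x||_0 >= 1

# Every feasible 1-sparse vector is a minimizer of (P_0); test one per support.
candidates = []
for j, col in enumerate(columns):
    t = dot(col, f)                                     # least-squares coefficient
    residual = math.sqrt(max(dot(f, f) - t * t, 0.0))   # ||f - t * col||
    if residual <= sigma:
        # x_tilde = t * e_j and x0 = e_0, so the error is |t - 1| or sqrt(t^2 + 1)
        error = abs(t - 1.0) if j == 0 else math.hypot(t, 1.0)
        candidates.append((j, t, error))

bound = 2.0 * sigma / eps_2
for j, t, error in candidates:
    print(j, round(t, 4), round(error, 4), "<=", round(bound, 4))
```

With these values two supports are feasible: the true column and its close neighbor. The second one incurs an error larger than 1 even though the noise has norm 0.3, which is the kind of magnification by $1/\varepsilon_{2k}$ that the theorem quantifies.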


2.2 Minimax recovery 

In this section we prove Theorem 1. The upper bound follows from choosing any $\ell_0$ minimizer and
applying Theorem 4.

For the lower bound, let $\tilde{x}(f)$ be any function of $f$. Pick $x \in \mathbb{R}^n$ such that $\|x\| = 1$, $\|Ax\| \le \varepsilon_{2k}$,
and $\|x\|_0 = 2k$. As in the argument in the previous section, partition $x$ into two components $x_0$ and
$-x_1$ of sparsity $k$, but normalize them so that $x = \frac{\varepsilon_{2k}}{\sigma}(x_0 - x_1)$. Then we have $\|A(x_0 - x_1)\| \le \sigma$.
Now let $f = A x_0$, and compute

$$ \frac{\sigma}{\varepsilon_{2k}} = \|x_0 - x_1\| = \|\tilde{x}(f) - x_0 - (\tilde{x}(f) - x_1)\| $$
$$ \le \|\tilde{x}(f) - x_0\| + \|\tilde{x}(f) - x_1\| $$
$$ \le 2 \max\{\|\tilde{x}(f) - x_0\|, \|\tilde{x}(f) - x_1\|\}. $$

The data $f$ can be seen as derived from $x_0$, since $f = A x_0$, but also from $x_1$, since $f = A x_1 + e$ for
some vector $e$ with $\|e\| \le \sigma$. Hence

$$ \frac{\sigma}{2 \varepsilon_{2k}} \le \max\{\|\tilde{x}(f) - x_0\|, \|\tilde{x}(f) - x_1\|\} \le \max_{j = 0, 1} \; \sup_{\|e_j\| \le \sigma} \|\tilde{x}(A x_j + e_j) - x_j\| $$
$$ \le \sup_{\|x\|_0 = k} \; \sup_{\|e\| \le \sigma} \|\tilde{x}(A x + e) - x\|. $$


The lower bound holds uniformly over the choice of $\tilde{x}$, which establishes the claim.

2.3 Recovery from the $\varepsilon$-spark

We introduce the notion of $\varepsilon$-spark of $A$, as a natural modification of the notion of spark introduced
in [11], and link it to the notion of lower restricted isometry constant.

Definition 2. ($\varepsilon$-spark) Fix $\varepsilon \ge 0$. Then $s_\varepsilon$ is the largest integer such that, for every $T$, $|T| \le s_\varepsilon$,

$$ \varepsilon \le \sigma_{\min}(A_T). $$

When the lower restricted isometry constant $\varepsilon_s$ is strictly decreasing, it is easy to see that
$s_{\varepsilon_s} = s$, i.e., $s_\varepsilon$ is a composition inverse of $\varepsilon_s$. However, we cannot in general expect better than
$\varepsilon_{s_\varepsilon} \ge \varepsilon$. When $\varepsilon = 0$, we recover the spark introduced in [11], though our 0-spark is in fact Donoho
and Elad’s spark minus one$^4$.

In other words, the definition of $\varepsilon$-spark parallels that of spark, but replaces the notion of rank
deficiency by that of being $\varepsilon$-close, in spectral norm.

Theorems 1 and 4 can be seen as the robust version of the basic recovery result in [11]. The
following theorem is a literal transcription of Theorem 4 in the language of the $\varepsilon$-spark. We
let $\lfloor a \rfloor$ and $\lceil a \rceil$ denote, respectively, the largest integer not exceeding $a$ and the smallest integer not less than $a$.

Theorem 5. Let $\sigma > 0$.

(i) Assume that $\|x_0\|_0 \le \lfloor s_\delta / 2 \rfloor$ for some $\delta > 0$. Then any minimizer $\tilde{x}$ of $(P_0)$ obeys $\|\tilde{x} - x_0\| \le
2\sigma/\delta$.

(ii) Assume that $\sigma \ge \sigma_{\min}(A)$. There exists $x_0$ such that $\|x_0\|_0 > \lceil s_\sigma / 2 \rceil$, for which $f = A x_0$
can be approximated by a sparser vector than $x_0$, in the sense that there exists $x$ such that
$\|x\|_0 \le \|x_0\|_0$, $\|x - x_0\| = 1$, and $\|f - A x\| \le \sigma$.

In other words, the sharp recovery condition comparing the noise level with the lower restricted
isometry constant at level $2k$, namely

$$ \sigma \le \varepsilon_{2 \|x_0\|_0}, $$

can be rephrased as the comparison of the sparsity level to half the $\sigma$-spark, as

$$ \|x_0\|_0 \le \frac{s_\sigma}{2}. $$

These two points of view are equivalent.

3 Consecutive atoms

In this section we prove Theorem 2. We return to the case $A_{\theta j} = a_j(\theta) = \frac{e^{i j \theta}}{\sqrt{2\pi y}}$.

Any upper bound on $\sigma_{\min}(A_T)$ provides an upper bound on $\varepsilon_{n+1}$ when $|T| = n + 1$. However,
in order to get a lower bound on $\varepsilon_{n+1}$, we need to control $\sigma_{\min}(A_T)$ for every $T$ of cardinality $n + 1$.
The following lemma establishes that $T = \{0, 1, \ldots, n\}$ gives rise to the lowest $\sigma_{\min}(A_T)$, at least
in the limit $y \to 0$. The proof is postponed to Section 4.

Lemma 1. There exists $y^* > 0$ such that, for all $0 < y < y^*$, the minimum of $\sigma_{\min}(A_T)$ over
$T : |T| = n + 1$ is attained when $T = \{0, 1, \ldots, n\}$.
4 That seems to be the price to pay to get $s_{\varepsilon_s} = s$.

It therefore suffices to find lower and upper bounds on the least singular value of $A_{\theta j}$, as a
semi-continuous matrix with row coordinate $\theta \in [-\pi y, \pi y]$ and column index $0 \le j \le n$. The result
that we prove in this section is as follows.

Lemma 2. Let $T = \{0, 1, \ldots, n\}$ and $c(y) = \sin(\pi y / 2)$. There exists $C > 0$ such that

$$ C\, c(y)^n \;\le\; \sigma_{\min}(A_T) \;\le\; 4\, c(y)^n. $$

The singular values of $A_T$ are the square roots of the eigenvalues of the section $0 \le j_1, j_2 \le n$
of the Gram matrix

$$ G_{j_1 j_2} = \int_{-\pi y}^{\pi y} a_{j_1}(\theta)\, \overline{a_{j_2}(\theta)}\, d\theta. $$

A detour through complex analysis will provide tools that will help understand the eigenvalues of
$G$.
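
For concreteness, the entries of $G$ can be evaluated in closed form: with $a_j(\theta) = e^{ij\theta}/\sqrt{2\pi y}$, a direct integration gives $G_{j_1 j_2} = \sin(\pi y m)/(\pi y m)$ with $m = j_1 - j_2$, and $G_{jj} = 1$. The sketch below compares this expression with a midpoint quadrature of the defining integral. The function names and the quadrature resolution are our own choices for illustration.

```python
import cmath
import math

def atom(j, theta, y):
    """Renormalized atom a_j(theta) = exp(i j theta) / sqrt(2 pi y)."""
    return cmath.exp(1j * j * theta) / math.sqrt(2.0 * math.pi * y)

def gram_quadrature(j1, j2, y, steps=2000):
    """Midpoint rule for the integral of a_{j1} * conj(a_{j2}) over [-pi y, pi y]."""
    h = 2.0 * math.pi * y / steps
    total = 0.0
    for m in range(steps):
        theta = -math.pi * y + (m + 0.5) * h
        total += atom(j1, theta, y) * atom(j2, theta, y).conjugate()
    return total * h

def gram_closed_form(j1, j2, y):
    """sin(pi y m) / (pi y m) with m = j1 - j2, and 1 on the diagonal."""
    m = j1 - j2
    if m == 0:
        return 1.0
    x = math.pi * y * m
    return math.sin(x) / x

y = 0.25
for j1, j2 in [(0, 0), (0, 1), (0, 2), (1, 3)]:
    print(j1, j2, gram_quadrature(j1, j2, y).real, gram_closed_form(j1, j2, y))
```

The matrix is real, symmetric and Toeplitz, and its off-diagonal entries approach 1 as $y \to 0$, which is the source of the ill-conditioning studied in this section.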


3.1 Preliminaries on complex analysis and Szegő’s theory

In the sequel we rely on the characterization of $G$ as a Toeplitz form for the Lebesgue measure on
a circle arc in the complex plane. Notice that $a_j(\theta) = \frac{1}{\sqrt{2\pi y}} z^j$ with $z = e^{i\theta}$. Let $\Gamma$ be the circle arc

$$ \Gamma = \{z : |z| = 1, \; -\pi y \le \arg z \le \pi y\}. $$

Its length is $L = 2\pi y$. Consider the arclength inner product

$$ (f, g) = \frac{1}{L} \int_\Gamma f(z)\, \overline{g(z)}\, |dz|, \tag{2} $$

and the corresponding norm $\|f\| = \sqrt{(f, f)}$. On the unit circle, $|dz| = \frac{dz}{iz}$. With this inner
product, we can understand $G$ as the Gram matrix of the monomials:

$$ G_{j_1, j_2} = (z^{j_1}, z^{j_2}). $$

The orthogonal (Szegő) polynomials $\{p_n(z)\}$ on $\Gamma$ play an important role. They are defined
from applying the Gram-Schmidt orthogonalization on the monomials $\{z^n\}$, resulting in

$$ (p_m, p_n) = \delta_{mn}. $$

Denote by $k_n$ the coefficient of the highest power of $p_n(z)$, i.e., $p_n(z) = k_n z^n + \ldots$. Observe that
the $p_n(z)$ are extremal in the following sense.

Lemma 3. (Christoffel variational principle) Let $\mathcal{M}_n \subset \mathcal{P}_n$ be the set of degree-$n$ monic$^5$ polynomials
over $\Gamma$. Then

$$ \min_{\pi \in \mathcal{M}_n} \|\pi\| = \frac{1}{k_n}. \tag{3} $$

The unique minimizer is $k_n^{-1} p_n(z)$.

Proof. Let $\pi(z) = \sum_{m=0}^{n} \lambda_m p_m(z)$ with $\lambda_n k_n = 1$. By orthonormality,

$$ \|\pi\|^2 = \sum_{m=0}^{n} |\lambda_m|^2. $$

Under the constraint $\lambda_n = 1/k_n$, this quantity is minimized when $\lambda_0 = \cdots = \lambda_{n-1} = 0$. In that case
the minimizer is $\lambda_n p_n(z) = k_n^{-1} p_n(z)$ and the minimum of $\|\pi\|^2$ is $\lambda_n^2 = k_n^{-2}$. □

5 With coefficient of the leading power equal to one.

In order to quantify $k_n$, we need to better understand the asymptotic properties of $p_n(z)$ at
infinity. Consider the analytic function $z = \phi(w)$ which maps $|w| > 1$ conformally onto the
exterior of $\Gamma$, such that $w = \infty$ is preserved, and such that the orientation at $\infty$ is preserved. It
has the explicit expression

$$ \phi(w) = w\, \frac{c w + 1}{w + c}, \tag{4} $$

with

$$ c = \sin \frac{\pi y}{2}. \tag{5} $$

Indeed, it can be seen that

$$ \phi(e^{i\theta}) = \exp\!\big( 2 i \arg(1 + c\, e^{i\theta}) \big), $$

with an argument that covers $[-\pi y, \pi y]$ twice. This expression for $\phi$ is not new, see for example
[?].

The number $c$ in (5) is the so-called capacity of $\Gamma$.

Definition 3. The capacity (or transfinite diameter) of $\Gamma$ is the coefficient of $w$ in the Laurent
expansion of $\phi(w)$ at infinity.


In our case, the Laurent expansion at $\infty$ is

$$ \phi(w) = c w + (1 - c^2) - \frac{c (1 - c^2)}{w + c} = c w + (1 - c^2) + \sum_{n > 0} \gamma_n w^{-n}, $$

for some coefficients $\gamma_n$, hence the capacity of $\Gamma$ is indeed $c = \sin \frac{\pi y}{2}$.
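
The properties of $\phi$ stated above are easy to check numerically. The following sketch, written under the assumption $y = 0.25$ (an arbitrary illustrative value), samples $\phi$ on the unit circle and verifies that the image has unit modulus and argument confined to $[-\pi y, \pi y]$, and that the partial-fraction form of the Laurent expansion agrees with (4) at a test point. The function names are ours.

```python
import cmath
import math

def phi(w, c):
    """Conformal map (4): phi(w) = w (c w + 1) / (w + c)."""
    return w * (c * w + 1.0) / (w + c)

def phi_partial_fraction(w, c):
    """The same map written as c w + (1 - c^2) - c (1 - c^2) / (w + c)."""
    return c * w + (1.0 - c * c) - c * (1.0 - c * c) / (w + c)

y = 0.25                            # illustrative value of y = 1 / SRF
c = math.sin(math.pi * y / 2.0)     # capacity of the arc

# Image of the unit circle under phi, sampled at 3600 equispaced angles.
samples = [phi(cmath.exp(2j * math.pi * m / 3600), c) for m in range(3600)]
max_modulus_error = max(abs(abs(z) - 1.0) for z in samples)
max_argument = max(abs(cmath.phase(z)) for z in samples)

# columns: deviation from |z| = 1, largest |arg z| found, and pi * y
print(max_modulus_error, max_argument, math.pi * y)
```

The largest argument is attained where $\cos\theta = -c$, at which point $\arg(1 + c e^{i\theta}) = \arcsin c = \pi y / 2$.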

A major finding in Szegő’s theory [?] is the asymptotic match $p_n(z) \sim g_n(z)$ at $z = \infty$,
where

$$ g_n(z) = \left( \frac{L}{2\pi} \right)^{1/2} (\Phi'(z))^{1/2}\, (\Phi(z))^n, \tag{6} $$

and where $w = \Phi(z)$ is the composition inverse of (4). In our case, we compute

$$ \Phi(z) = \frac{z - 1}{2c} + \left[ \frac{(z - 1)^2}{4 c^2} + z \right]^{1/2} = \frac{z}{c} + \frac{c^2 - 1}{c} + \sum_{n > 0} \frac{\delta_n}{z^n}. \tag{7} $$

The extremities of $\Gamma$ are branch points for the square root, and the branch cut should be on $\Gamma$ itself
for $\Phi(z)$ to be analytic outside $\Gamma$.

Recall that $L = 2\pi y$. Matching asymptotics at infinity yields

$$ p_n(z) \sim k_n z^n, \qquad \sqrt{y}\, (\Phi'(z))^{1/2}\, (\Phi(z))^n \sim \sqrt{y}\, c^{-n - \frac{1}{2}} z^n, $$

hence we anticipate that

$$ k_n \sim \sqrt{y}\, c^{-n - \frac{1}{2}} \tag{8} $$

as $n \to \infty$. We formulate a non-asymptotic version of this result in the next subsection.

An important proof technique in the sequel is the Szegő kernel $K(z, z_0)$. The Hardy space
$H^2(\Omega)$, where $\Omega$ extends to $\infty$ and has boundary $\Gamma$, is the space of analytic functions, bounded at
infinity, and with square-integrable trace on $\Gamma$.

Definition 4. The Szegő kernel $K(z, \zeta)$ relative to the exterior $\Omega$ of a Jordan curve or arc $\Gamma$
is the reproducing kernel for $H^2(\Omega)$, i.e., the unique function $K(\cdot, z) \in H^2(\Omega)$ such that, for all
$F \in H^2(\Omega)$,

$$ F(z) = \frac{1}{L} \int_\Gamma F(\zeta)\, \overline{K(\zeta, z)}\, |d\zeta|, \qquad z \in \Omega. \tag{9} $$

We would have liked to have found the following result in the literature.

Proposition 6. Let $\Gamma$ be the image of the unit circle $|w| = 1$ under the conformal map $z = \phi(w)$,
and assume that $\Gamma$ is a Jordan arc. Assume that $\phi$ is one-to-one and invertible for $z$ outside $\Gamma$,
and let $w = \Phi(z)$. Then the Szegő kernel obeys

$$ K(\zeta, z) = \frac{L}{\pi} \left( \Phi'(\zeta)\, \overline{\Phi'(z)} \right)^{1/2} \frac{\Phi(\zeta)\, \overline{\Phi(z)}}{\Phi(\zeta)\, \overline{\Phi(z)} - 1}. \tag{10} $$


Proof. The transformation law for the Szegő kernel under a conformal map $w = \Phi(z)$ is [?]

$$ K(z', z) = (\Phi'(z'))^{1/2}\, K_0(\Phi(z'), \Phi(z))\, \left( \overline{\Phi'(z)} \right)^{1/2}, \tag{11} $$

where $K_0$ is for the pre-image $\Gamma_0$ (a Jordan curve) of $\Gamma$. The formula assumes that $K$ and $K_0$ are
reproducing for the arclength inner products without prefactors, both in $w$ and $z$. In our setting,
the desired $K$ is however normalized for (9), which involves a $1/L$ prefactor, and a single rather
than double traversal of the arc $\Gamma$. Our desired $K$ is therefore $2L$ times the right-hand side in (11).

In our case $\Gamma_0$ is the unit circle. It suffices therefore to show that the Szegő kernel for the
exterior of the unit circle is

$$ K_0(w', w) = \frac{1}{2\pi} \frac{w' \overline{w}}{w' \overline{w} - 1}. \tag{12} $$

Recall Cauchy’s integral formula for bounded analytic functions in the exterior of the unit circle:

$$ \frac{1}{2\pi i} \oint_{|w'| = 1} \frac{f(w')}{w' - w}\, dw' = \begin{cases} f(\infty) - f(w) & \text{if } |w| > 1; \\ f(\infty) & \text{if } |w| < 1. \end{cases} $$

Note that $dw' = i w' |dw'|$ and $\overline{w'} = 1/w'$ on the unit circle. Evaluate the Cauchy formula at $w = 0$
in order to obtain $f(\infty)$, then simplify the formula for the case $|w| > 1$ as

$$ f(w) = \frac{1}{2\pi} \int_{|w'| = 1} \left( 1 - \frac{1}{1 - w \overline{w'}} \right) f(w')\, |dw'| = \frac{1}{2\pi} \int_{|w'| = 1} \frac{w \overline{w'}}{w \overline{w'} - 1}\, f(w')\, |dw'|. $$

This expression is of the form

$$ f(w) = \int_{|w'| = 1} f(w')\, \overline{K_0(w', w)}\, |dw'|, $$

with $K_0$ given by (12). To complete the proof, we must observe that $K_0(w', w)$ is analytic and
bounded in $w'$, hence a member of $H^2(\Omega)$ as a function of $w'$, as required by Definition 4. (This point is important:
the Cauchy kernel doubles as Szegő kernel only for the unit circle.)

□

The limits as $\zeta$ and/or $z \to \infty$ exist and are finite since $K$ is an element of $H^2(\Omega)$. We also
note that the kernel $K$ is extremal in the following sense.

Lemma 4. (Widom [29]) Consider

$$ \mu = \inf_F \int_\Gamma |F(z)|^2\, |dz|, $$

where the infimum is over $F \in H^2(\Omega)$ such that $F(z_0) = 1$ for some $z_0 \in \overline{\mathbb{C}}$ (the extended complex
plane including $z = \infty$). The infimum is a minimum, the extremal function is unique, and obeys

$$ F(z) = \frac{K(z, z_0)}{K(z_0, z_0)}. $$

3.2 Non-asymptotic bounds on the coefficient $k_n$

Theorem 7. With $c = \sin \frac{\pi y}{2}$, we have

$$ \frac{c}{2y}\, c^{2n} \;\le\; k_n^{-2} \;\le\; 4 (1 + 2y)^2 c^{2n}. $$

Proof. The proof of the lower bound is essentially an argument due to Widom [29] that we reproduce
for convenience. Let $\pi(z) = k_n^{-1} p_n(z)$. From the characterization of $\Phi(z)$ in (7), and since $\pi$ is
monic, we get

$$ \lim_{z \to \infty} (c\, \Phi(z))^{-n} \pi(z) = 1. $$

Consider now the quantity

$$ J = \int_\Gamma \left| (c\, \Phi(z))^{-n} \pi(z) \right|^2 |dz|. $$

Since $|\Phi(z)| = 1$ on $\Gamma$, and using Lemma 3, we obtain

$$ J = c^{-2n} \int_\Gamma |\pi(z)|^2\, |dz| = c^{-2n} L\, k_n^{-2}. $$

On the other hand, we can write the lower bound

$$ J \ge \inf_F \int_\Gamma |F(z)|^2\, |dz| = \mu, $$

where the infimum is over all $F$ in the Hardy space of analytic functions in the exterior of $\Gamma$, square
integrable over $\Gamma$, and such that $F(\infty) = 1$. We can invoke Lemma 4 and Proposition 6 to obtain
the unique extremal function

$$ F(z) = \frac{K(z, \infty)}{K(\infty, \infty)} = (c\, \Phi'(z))^{1/2}. $$

We can compute the value of $\mu$ by hand:

$$ \mu = \int_\Gamma |F(z)|^2\, |dz| = \frac{1}{2} \oint |F(z)|^2\, |dz| $$
$$ = \frac{c}{2} \oint |\Phi'(z)|\, |dz| $$
$$ = \frac{c}{2} \int_{|w| = 1} |dw| = c \pi. $$

The factor 1/2 in the first line owes to the fact that, in order to change variables from $z$ to $w$, the
curve $\Gamma$ is traversed twice as $w$ traverses the unit circle. We can now combine the various bounds
to obtain

$$ k_n^{-2} \ge \frac{\mu}{L}\, c^{2n} = \frac{c}{2y}\, c^{2n}. $$

The proof of the upper bound is somewhat trickier and does not follow the standard asymptotic
arguments of Szegő [?] and Widom [29]. We use Lemma 3 and invoke the classical fact that the
so-called Faber polynomial $\Phi_n(z)$ is an adequate substitute for $p_n(z)$, with comparable oscillation
and size properties. In this context, we define $\Phi_n(z)$ as the polynomial part of the Laurent expansion
at infinity of the function $(\Phi(z))^n$. From (7), we observe that

$$ \Phi_n(z) = c^{-n} z^n + \text{lower-order terms}. $$

The monic version of $\Phi_n(z)$, for use in place of the minimizer in (3), is $f_n(z) = c^n \Phi_n(z)$. We
now make use of a relatively recent inequality due to Ellacott [14] (which in turn owes much to a
characterization of $\Phi_n(z)$ due to Pommerenke [?]),

$$ \max_{z \in \Gamma} |\Phi_n(z)| \le \frac{V}{\pi}, $$

where $V$ is the so-called total rotation of $\Gamma$, defined as the total change in angle as one traverses
the curve, with positive increments regardless of whether the rotation occurs clockwise or
counterclockwise. In other words, if $\theta(z)$ is the angle that the tangent to $\Gamma$ at $z$ makes with the horizontal,
then $V$ is the total variation of $\theta(z)$, or

$$ V = \int |d\theta(z)|. $$

In the case of a circle arc of opening angle $2\pi y$, it is easy to see that $V = 2\pi (1 + 2y)$.
We conclude with the sequence of bounds

$$ k_n^{-2} \le \|f_n\|^2 $$
$$ = c^{2n}\, \frac{1}{L} \int_\Gamma |\Phi_n(z)|^2\, |dz| $$
$$ \le c^{2n} \left( \frac{V}{\pi} \right)^2 $$
$$ \le 4 (1 + 2y)^2 c^{2n}. $$
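
The two-sided bound of Theorem 7 can be checked numerically for small $n$. The sketch below is an illustration under assumptions of our own: it uses the closed-form entries $\sin(\pi y m)/(\pi y m)$ of the Gram matrix of the monomials, and it relies on the standard fact that Gram-Schmidt orthogonalization corresponds to a Cholesky factorization of the Gram matrix, so that the $m$-th diagonal entry of the lower Cholesky factor equals $1/k_m$. The values $y = 0.25$ and $n = 5$ and the function names are arbitrary choices.

```python
import math

def sinc_gram(n, y):
    """Gram matrix G[i][j] = (z^i, z^j) on the arc of half-angle pi*y, size (n+1) x (n+1)."""
    def g(m):
        if m == 0:
            return 1.0
        x = math.pi * y * m
        return math.sin(x) / x
    return [[g(i - j) for j in range(n + 1)] for i in range(n + 1)]

def cholesky_diagonal(G):
    """Diagonal of the lower Cholesky factor of a real symmetric positive definite matrix."""
    size = len(G)
    chol = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = G[i][j] - sum(chol[i][p] * chol[j][p] for p in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return [chol[i][i] for i in range(size)]

y, n = 0.25, 5
c = math.sin(math.pi * y / 2.0)
inv_k = cholesky_diagonal(sinc_gram(n, y))   # inv_k[m] = 1 / k_m

for m, d in enumerate(inv_k):
    lower = c / (2.0 * y) * c ** (2 * m)
    upper = 4.0 * (1.0 + 2.0 * y) ** 2 * c ** (2 * m)
    # columns: degree m, lower bound, k_m^{-2}, upper bound
    print(m, lower, d * d, upper)
```

Double precision is adequate here because $k_m^{-2}$ stays above roughly $10^{-5}$ for these parameters; for much larger $n$ or smaller $y$ the Gram matrix becomes too ill-conditioned for a naive Cholesky factorization.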


Remark 3.3. The exact asymptotic rate for $k_n^{-2}$ as $n \to \infty$ can be inferred from the work of
Widom [29] in the same fashion as above; it is

$$ k_n^{-2} \sim \frac{c}{y}\, c^{2n}. $$

However, favorable inequalities for small $n$ are not readily available from those arguments. The
reason for the factor 2 discrepancy between the lower bound in Theorem 7 and the asymptotic rate
can be traced to the fact that $\Gamma$ is a Jordan arc (with empty interior), not a Jordan curve. It is for
the same reason that the asymptotic expression (6) differs from that given by Szegő in [27, p. 372]
by a factor $1/\sqrt{2}$.

□

3.4 Upper bound on the smallest singular value

Let

$$ \mathcal{A}_n = \mathrm{span}\{a_j(\theta); \; 0 \le j \le n\}, $$

and $P_n$ be the orthoprojector on $\mathcal{A}_n$. Subspace angles make it possible to formulate upper bounds on
eigenvalues of $G$. Specifically, recall that $\|a_n\| = 1$, and consider

$$ \sin \angle(a_n, \mathcal{A}_{n-1}) = d(a_n, \mathcal{A}_{n-1}) = \|a_n - P_{n-1} a_n\|. $$

The norms are in $L^2(-\pi y, \pi y)$.

Lemma 5. Consider a matrix $[A \; b]$ with columns normalized to unit norm. Then its smallest
singular value obeys

$$ s_{\min}([A \; b]) \le \sin \angle(b, \mathrm{Ran}\, A). $$

Proof. Denote by $P_A$ the orthoprojector onto $\mathrm{Ran}\, A$. In the matrix spectral norm $\| \cdot \|_2$,

$$ s_{\min}([A \; b]) \le \| [A \; b] - [A \; P_A b] \|_2 \qquad \text{(SVD gives the best rank-$(n-1)$ approximation)} $$
$$ = \|b - P_A b\| $$
$$ = \sin \angle(b, \mathrm{Ran}\, A) \qquad \text{(because $\|b\| = 1$)} $$

□


The change of variables $z = e^{i\theta}$ reveals that

$$ \|a_n - P_{n-1} a_n\| = \|z^n - P_{n-1} z^n\| = d(z^n, \mathcal{P}_{n-1}), $$

where $P_n$ is overloaded to mean the orthoprojector onto $\mathcal{P}_n = \mathrm{span}\{1, z, \ldots, z^n\}$; where the first norm
is in $L^2(-\pi y, \pi y)$; and where the second norm is given by equation (2).

It is then well-known that $d(z^n, \mathcal{P}_{n-1})$ is accessible from the coefficient $k_n$ introduced earlier in
Section 3.1.

Lemma 6.

$$ d(z^n, \mathcal{P}_{n-1}) = \frac{1}{k_n}, $$

where $p_n(z) = k_n z^n + \text{lower-order terms}$ is the orthogonal polynomial introduced in Section 3.1.

Proof. The Gram-Schmidt orthogonalization procedure yields

$$ p_n(z) = \frac{z^n - P_{n-1} z^n}{\|z^n - P_{n-1} z^n\|}, $$

which takes the form

$$ p_n(z) = k_n z^n + q_{n-1}(z), \qquad q_{n-1} \in \mathcal{P}_{n-1}, $$

with $1/k_n = \|z^n - P_{n-1} z^n\| = d(z^n, \mathcal{P}_{n-1})$. □

We can now combine Lemmas 5 and 6 with Theorem 7 and $y < 1/2$, to conclude that the least
singular value of $a_j(\theta)$, with $\theta \in [-\pi y, \pi y]$ and $0 \le j \le n$, is upper-bounded by

$$ k_n^{-1} \le 4 c^n, \qquad c = \sin(\pi y / 2). \tag{13} $$

3.5 Lower bound on the smallest singular value

Recall that $\sigma_{\min}(A_T) = \sqrt{\lambda_{\min}(G)}$, where $G$ is the Gram matrix

$$ G_{j_1 j_2} = (z^{j_1}, z^{j_2}) = \frac{1}{L} \int_\Gamma z^{j_1}\, \overline{z}^{\,j_2}\, |dz|, \qquad 0 \le j_1, j_2 \le n. $$

We make use of the following characterization of the eigenvalues of $G$, which according to Berg
and Szwarc [2], was first discovered by Aitken [8]. It was also used in the work of Szegő [?] and
that of Widom and Wilf [30].

Lemma 7. (Aitken)

$$ \frac{1}{\lambda_{\min}(G)} = \max_P \frac{1}{2\pi} \int_0^{2\pi} |P(e^{i\theta})|^2\, d\theta, $$

where the maximum is over all degree-$n$ polynomials $P(z)$ such that $\|P\| = 1$.

Proof. The variational characterization of $\lambda_{\min}(G)$ gives

$$ \lambda_{\min}(G) = \min_c \frac{c^* G c}{c^* c} = \min_P \frac{\|P\|^2}{c^* c}, $$

where $c = (c_0, \ldots, c_n)^T$, and the last min is over $P$ of the form $P(z) = \sum_{k=0}^{n} c_k z^k$. For such a $P$,
we can apply orthogonality of $z^n$ on the unit circle to obtain

$$ \frac{1}{2\pi} \int_0^{2\pi} |P(e^{i\theta})|^2\, d\theta = c^* c. $$

□


A useful bound for the growth of any such $P(z)$ away from $\Gamma$ can be obtained from the fact
that the Szegő kernel reproduces bounded analytic functions outside of $\Gamma$. The following argument
was used in [29].

Lemma 8. Let $P(z)$ be a polynomial of degree $n$ such that $\|P\|^2 = \frac{1}{L} \int_\Gamma |P(z)|^2\, |dz| = 1$. Then

$$ |P(z)| \le K(z, z)^{1/2}\, |\Phi(z)|^n. $$

Proof. Let $F(z) = \Phi(z)^{-n} P(z)$. This analytic function obeys $\|F\| = \|P\| = 1$ (since $|\Phi(z)| = 1$ on $\Gamma$), and is bounded at $z = \infty$,
hence belongs to the Hardy space $H^2(\Omega)$. By Definition 4,

$$ F(z) = \frac{1}{L} \int_\Gamma \overline{K(\zeta, z)}\, F(\zeta)\, |d\zeta|. $$

By Cauchy-Schwarz, we get

$$ |F(z)| \le \left( \frac{1}{L} \int_\Gamma |K(\zeta, z)|^2\, |d\zeta| \right)^{1/2} \|F\|. $$

The Szegő kernel is itself in $H^2(\Omega)$ as a function of its left argument, hence

$$ K(z, z') = \frac{1}{L} \int_\Gamma K(\zeta, z')\, \overline{K(\zeta, z)}\, |d\zeta|. $$

By letting $z = z'$, we get $\frac{1}{L} \int_\Gamma |K(\zeta, z)|^2\, |d\zeta| = K(z, z)$, hence

$$ |F(z)| \le K(z, z)^{1/2}, $$

and

$$ |P(z)| = |F(z)|\, |\Phi(z)|^n \le K(z, z)^{1/2}\, |\Phi(z)|^n. $$

An application of Proposition 6 yields an upper bound for $P(z)$ as

$$ |P(z)|^2 \le \frac{L}{\pi}\, |\Phi'(z)|\, \frac{|\Phi(z)|^2}{|\Phi(z)|^2 - 1}\, |\Phi(z)|^{2n}, \qquad z \notin \Gamma. \tag{14} $$

□


It is a good match, up to a factor $\sqrt{2}$, with the absolute value of the asymptotic approximation
(6) for $p_n(z)$ at $z = \infty$. (This $\sqrt{2}$ factor can again be traced to the fact that $\Gamma$ is a Jordan arc
traversed once, not a Jordan curve. It is unclear to us that it can be removed in the context of
non-asymptotic bounds.)

However, near $\Gamma$ where $|\Phi(z)| = 1$, the bound is very loose. To formulate a better bound,
consider the banana-shaped region bounded by $\Gamma_2 = \{z : |\Phi(z)| = 2\}$.

Lemma 9. Let $P(z)$ be a polynomial of degree $n$ such that $\|P\|^2 = \frac{1}{L} \int_\Gamma |P(z)|^2\, |dz| = 1$. For $z$
in the interior of $\Gamma_2$,

$$ |P(z)|^2 \le \frac{4L}{\pi}\, \frac{2^{2n}}{c \sqrt{1 - c^2}}. \tag{15} $$

Proof. Since $P(z)$ is analytic, we can apply the maximum modulus principle inside $\Gamma_2$. In order to
use the bound (14) on $\Gamma_2$, we need an upper bound on $|\Phi'(z)|$. By passing to the $w = \Phi(z)$ variable
via (4) and $\phi'(\Phi(z)) = 1/\Phi'(z)$, it is elementary but tedious to show that

$$ |\Phi'(z)| \le \frac{1}{c \sqrt{1 - c^2}}\, \frac{(|\Phi(z)| + c)^2}{|\Phi(z)|^2 - 1}. $$

When $|\Phi(z)| = 2$, the bounds combine to give (15).

□


We are now left with the task of bounding $\int_{|z|=1} |P(z)|^2\,|dz|$ from equations (14) and (15). Call $R_1$ the region defined by $|z| = 1$ and $|\Phi(z)| > 2$, while $R_2$ corresponds to $|z| = 1$ and $|\Phi(z)| \le 2$.


• For $R_1$, it is advantageous to pass to the $w = \Phi(z)$ variable via $\phi$. The pre-image of the arc of $|z| = 1$ limited by $|\Phi(z)| > 2$ is the arc of the circle $C$ of equation

$$\left|w + \frac{1}{c}\right|^2 = \frac{1 - c^2}{c^2},$$

limited by $|w| > 2$. Using (14), the two Jacobians cancel out and we get

$$\int_{R_1} |P(z)|^2\,|dz| \le \frac{L}{\pi}\int_C \frac{|w|^2}{|w|^2 - 1}\,|w|^{2n}\,|dw|.$$

Parametrize $C$ using $w = -\frac{1}{c} - \frac{\sqrt{1-c^2}}{c}\,e^{i\theta}$ with $\theta \in [-\pi, \pi)$, so that the maximum of the integrand occurs when $\theta = 0$. The measure becomes $|dw| = \frac{\sqrt{1-c^2}}{c}\,d\theta$, and since $|w| > 2$, we have

$$\int_{R_1} |P(z)|^2\,|dz| \le \frac{4L}{3\pi}\,c^{-2n-1}(1-c^2)^{1/2}\int_{-\pi}^{\pi}\left(2 - c^2 + 2\sqrt{1-c^2}\cos\theta\right)^n d\theta.$$

The integrand is handled using the bound $a + b\cos\theta \le (a+b)\,e^{-\theta^2/(2(a/b+1))}$, here with $a = 2 - c^2$ and $b = 2\sqrt{1-c^2}$ (valid for $\theta \in [-\pi, \pi)$ as long as $c < .85$). Let $\alpha = \frac{a}{b} + 1$, so that

$$\int_{R_1} |P(z)|^2\,|dz| \le \frac{4L}{3\pi}\,c^{-2n-1}\,(2(2-c^2))^n\,(1-c^2)^{1/2}\int_{-\pi}^{\pi} e^{-n\theta^2/(2\alpha)}\,d\theta \le \frac{4L}{3\sqrt{2\pi n}}\left(\frac{4 - 2c^2}{c^2}\right)^{n+1/2}.$$

To get the last line, we have used the fact that $(1-c^2)^{1/2}\,\alpha^{1/2} \le (2-c^2)^{1/2}$.

• The contribution along $R_2$ is of a different order of magnitude. The endpoints $z_\pm$ of the corresponding arc of the unit circle can be obtained from $|\Phi(z)| = 2$ and $|z| = 1$, which reveals that $z_\pm = \phi(w_\pm)$ with $w_\pm = 2e^{\pm i\alpha}$ and $\cos\alpha = -5c/4$. Further elementary calculations (using $\mathrm{acos}(x) \le \frac{\pi}{2}\sqrt{1-x}$ for $x \ge 0$) show that the arc length between $z_+$ and $z_-$ is bounded by $2\sqrt{2}\,\pi c$. Hence (15) implies that

$$\int_{R_2} |P(z)|^2\,|dz| \le 8\sqrt{2}\,L\,\frac{1}{\sqrt{1-c^2}}\,2^{2n}.$$
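Two elementary facts used in the bullets above lend themselves to a quick numerical sanity check. The sketch below assumes that the cosine bound is applied with $a = 2 - c^2$, $b = 2\sqrt{1-c^2}$ and $\alpha = a/b + 1$ (read off from the integrand), and that the endpoints $w_\pm = 2e^{\pm i\alpha}$ lie on the circle $|w + 1/c|^2 = (1-c^2)/c^2$. The function names are hypothetical, and a grid check is of course not a proof.

```python
import numpy as np

def gaussian_bound_holds(c, num=20001, tol=1e-12):
    """Grid check of a + b*cos(t) <= (a + b)*exp(-t^2 / (2*alpha)) on [-pi, pi].

    Assumed identification: a = 2 - c^2, b = 2*sqrt(1 - c^2), alpha = a/b + 1.
    """
    a = 2.0 - c**2
    b = 2.0 * np.sqrt(1.0 - c**2)
    alpha = a / b + 1.0
    t = np.linspace(-np.pi, np.pi, num)
    lhs = a + b * np.cos(t)
    rhs = (a + b) * np.exp(-t**2 / (2.0 * alpha))
    # Small tolerance: both sides coincide to second order at t = 0.
    return bool(np.all(lhs <= rhs + tol))

def endpoint_residual(c):
    """Residual of w = 2*exp(i*ang), cos(ang) = -5c/4, in the circle equation
    |w + 1/c|^2 = (1 - c^2)/c^2 (requires c <= 0.8 so that arccos is defined)."""
    ang = np.arccos(-5.0 * c / 4.0)
    w = 2.0 * np.exp(1j * ang)
    return abs(abs(w + 1.0 / c) ** 2 - (1.0 - c**2) / c**2)

if __name__ == "__main__":
    for c in (0.1, 0.5, 0.85, 0.9):
        print(c, gaussian_bound_holds(c))
    print(endpoint_residual(0.3))
```

On such a grid the inequality holds for the sampled values of $c$ up to $.85$ and breaks down near $\theta = \pm\pi$ for $c = .9$, in line with the stated range of validity.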


The upper bound on $\int_{|z|=1} |P(z)|^2\,|dz|$ is then the sum of the contributions along $R_1$ and $R_2$. The former contribution always dominates the latter (up to multiplicative constants, either in the limit $c \to 0$ or $n \to \infty$), because our assumption that $y < \frac{1}{2}$ implies $c < \sqrt{2}/2$, and in turn $\left(\frac{4}{c^2} - 2\right)^n > 6^n > 4^n$. A bit of grooming results in the bound

$$\lambda_{\min}^{-1}(G) \le C\left(\frac{4 - 2c^2}{c^2}\right)^n,$$

where $C > 0$ is a reasonable numerical constant. An even shorter statement is $\lambda_{\min}(G) \ge C\left(\frac{c}{2}\right)^{2n}$.
For the least singular value of a,k(x), we get the lower bound 


C 


$c = \sin(\pi y/2)$.


(16) 


4 Non-consecutive atoms 

We now prove Lemma 1.

Let $T$ be a set of $n+1$ integers that we denote $\tau_j$ (the fact that they are integers has no bearing on the forthcoming argument). Let

$$(A_T)_{\theta,j} = \frac{e^{i\tau_j\theta}}{\sqrt{2\pi y}}, \qquad \theta \in [-\pi y, \pi y], \quad \tau_j \in T.$$

The Gram matrix $G_{j_1,j_2} = \int_{-\pi y}^{\pi y} \overline{(A_T)_{\theta,j_1}}\,(A_T)_{\theta,j_2}\,d\theta$ is invariant under translation of the $\tau_j$, hence so are its eigenvalues. We may therefore view $G$'s eigenvalues as functions of the differences $\tau_{j_1,j_2} = \tau_{j_2} - \tau_{j_1}$. Recall that $\lambda_{\min}(G) = \sigma_{\min}^2(A_T)$.

Definition 5. We say that a function $f$ of some arguments $\tau_{j_1,j_2}$, for $0 \le j_1 < j_2 \le n$, is increasing if

$$f(\{\tau'_{j_1,j_2}\}) \ge f(\{\tau_{j_1,j_2}\}),$$

provided $\tau'_{j_1,j_2} \ge \tau_{j_1,j_2}$ for all $0 \le j_1 < j_2 \le n$. Furthermore, $f$ is strictly increasing if

$$f(\{\tau'_{j_1,j_2}\}) > f(\{\tau_{j_1,j_2}\}),$$

provided at least one of the inequalities $\tau'_{j_1,j_2} \ge \tau_{j_1,j_2}$ is strict.

Theorem 8. (Monotonicity of $\lambda_{\min}(G)$ in $T$.) Fix $n+1$, the cardinality of $T$. There exists $y^* > 0$ such that, for all $0 < y \le y^*$, the eigenvalue $\lambda_{\min}(G)$ is an increasing function of the phase differences $\tau_{j_2} - \tau_{j_1}$.

The theorem shows that, as long as the $\tau_j$ are integers, the minimum eigenvalue of $G$ is minimized when $T$ is any set of $n+1$ consecutive integers. We conjecture that the result still holds when the restriction on $y$ is lifted.
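The monotonicity statement can be probed numerically on small examples. The sketch below assumes the normalization $(A_T)_{\theta,j} = e^{i\tau_j\theta}/\sqrt{2\pi y}$, under which the Gram matrix has the closed form $G_{j_1,j_2} = \sin(\pi y d)/(\pi y d)$ with $d = \tau_{j_1} - \tau_{j_2}$; this is exactly `np.sinc(y * d)`. The helper name `lambda_min_gram` is hypothetical, and a few test sets are an illustration rather than evidence for the general claim.

```python
import numpy as np

def lambda_min_gram(tau, y):
    """Smallest eigenvalue of the Gram matrix G[j1, j2] = sinc(y * (tau_j1 - tau_j2)).

    np.sinc(x) is sin(pi*x)/(pi*x), which equals
    (1 / (2*pi*y)) * integral over [-pi*y, pi*y] of exp(i*theta*d) d(theta)
    when x = y * d.
    """
    tau = np.asarray(tau, dtype=float)
    diff = tau[:, None] - tau[None, :]
    G = np.sinc(y * diff)
    return float(np.linalg.eigvalsh(G)[0])  # eigvalsh returns ascending order

if __name__ == "__main__":
    y = 0.05
    for T in ([0, 1, 2], [7, 8, 9], [0, 1, 3], [0, 2, 5]):
        print(T, lambda_min_gram(T, y))
```

Consecutive integers and their translates give the same value, while sets with larger gaps give a larger smallest eigenvalue for this small value of $y$.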

Proof. It suffices to show that $\lambda_{\min}(G)$ is strictly increasing in the phase differences in the limit $y \to 0$, since the claim will then also be true for sufficiently small $y$ by continuity of $\lambda_{\min}(G)$ as a function of $y$.

Without loss of generality, consider that $\theta \in [0, 2\pi y]$ instead of $[-\pi y, \pi y]$. This transformation does not change the eigenvalues of $G$. Expand the complex exponential in Taylor series to get


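$$v^*Gv = \sum_{j_1,j_2=0}^{n} v_{j_1}\overline{v_{j_2}}\,\frac{1}{2\pi y}\int_0^{2\pi y} e^{i\theta(\tau_{j_1} - \tau_{j_2})}\,d\theta = \sum_{m_1,m_2 \ge 0} q_{m_1}\overline{q_{m_2}}\,\frac{i^{m_1}(-i)^{m_2}}{m_1!\,m_2!}\,\frac{1}{2\pi y}\int_0^{2\pi y}\theta^{m_1+m_2}\,d\theta,$$

where $q_m$ is the $m$-th moment of $v$ with respect to the $\tau_j$, i.e.,

$$q_m = \sum_{j=0}^{n} v_j\,\tau_j^m.$$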


One way to invert this relationship for $v_j$ is to write

$$q = \begin{pmatrix} q_{0:n} \\ q_{n+1:\infty} \end{pmatrix} = \begin{pmatrix} M \\ N \end{pmatrix} v,$$

with the square Vandermonde matrix

$$M = \begin{pmatrix} 1 & \cdots & 1 \\ \tau_0 & \cdots & \tau_n \\ \vdots & & \vdots \\ \tau_0^n & \cdots & \tau_n^n \end{pmatrix},$$

and then let $v = M^{-1}q_{0:n}$. The integral factor in the expression of $v^*Gv$ is $\int_0^{2\pi y}\theta^{m_1+m_2}\,d\theta = (2\pi y)^{m_1+m_2+1}H_{m_1,m_2}$, with $H_{m_1,m_2} = \frac{1}{m_1+m_2+1}$ the Hilbert matrix. After further letting $D_y = \mathrm{diag}\left(\frac{(2\pi i)^m}{m!}\,y^m\right)$, we may express the Rayleigh quotient $J$ for $G$ in terms of $q$ as

$$J = \frac{v^*Gv}{v^*v} = y\,\frac{q^*D_y^*HD_yq}{q_{0:n}^*M^{-*}M^{-1}q_{0:n}}.$$

Notice that the dependence on $y$ is only present in the diagonal factor $D_y$ (and the leading scalar factor $y$).

Since at least one of the first $n+1$ components of $q$ is nonzero, and $H$ is positive definite when restricted to $0 \le m_1, m_2 \le n$, the minimum of $J$ must be of exact order $y^{2n+1}$ as $y \to 0$. This is for instance the case when $q_{0:n} = (0, \ldots, 0, 1)^T$ and $q_{n+1:\infty} = 0$. More generally, values on the order of $y^{2n+1}$ can only be obtained when the weight of $q_{0:n}$ is predominantly on the last component.

In the limit $y \to 0$, we now observe that the contribution of $q_{n+1:\infty}$ is negligible in the numerator of $J$, since

$$y \begin{pmatrix} 0 \\ q_{n+1:\infty} \end{pmatrix}^* D_y^*HD_y \begin{pmatrix} 0 \\ q_{n+1:\infty} \end{pmatrix} = O(y^{2n+3}) \ll y^{2n+1}.$$

Denote by $H_n$ and $D_{y,n}$ the respective $0 \le m_1, m_2 \le n$ sections of $H$ and $D_y$. With $p$ a shorthand for $D_{y,n}q_{0:n}$, the problem has been reduced to proving that the (nonzero) limit as $y \to 0$ of

$$\min_{p \ne 0}\; y^{-2n}\,\frac{p^*H_np}{p^*D_{y,n}^{-*}M^{-*}M^{-1}D_{y,n}^{-1}p}$$

is strictly increasing in the phase differences, in keeping with the statement of Theorem 8. We are in presence of the minimum generalized eigenvalue of the pencil $H_n - \mu B_y$, where

$$B_y = y^{2n}\,D_{y,n}^{-*}M^{-*}M^{-1}D_{y,n}^{-1}.$$


As $y > 0$, $B_y$ is invertible, hence all the generalized eigenvalues are positive. As $y \to 0$ however, $B_y$ degenerates to the rank-1 matrix

$$B_0 = c_n\,mm^*,$$

with $c_n > 0$ a constant that depends only on $n$, and where $m^*$ is the $n$th (i.e., last) row of $M^{-*}$. In that case, all but one of the generalized eigenvalues become $+\infty$. (Indeed, we can change basis to transform the pencil $H_n - \mu c_n mm^*$ into some $\tilde{H}_n - \mu e_1e_1^T$, whose characteristic polynomial has degree 1.) It is convenient to call this generalized eigenvalue $\mu$; it depends in a continuous and differentiable manner on $y$ as $y \to 0$. In the limit $y \to 0$, the gradient of $\mu$ in the components of $m$ can be obtained by standard perturbation analysis as

$$\nabla_m\mu = -\frac{2\mu}{m^Tm}\,m.$$

Interestingly, it does not depend on $H_n$, and only depends on $y$ through $\mu$.

The inverse of the Vandermonde matrix $M$ can be computed with Vieta's formulas, which yield a closed-form expression for $m$:

$$m_j = (-1)^n\prod_{i \ne j}\frac{1}{\tau_i - \tau_j}, \qquad 0 \le j \le n.$$

(The other elements of $M^{-1}$ appear to be significantly more complicated.) Each component $|m_j|$ in absolute value is manifestly strictly decreasing in the phase differences. We wish to reach the corresponding monotonicity conclusion for the eigenvalue $\mu$.
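The closed-form expression for $m$ is easy to cross-check against a numerical inverse. The sketch below assumes that $M$ has entries $M_{k,j} = \tau_j^k$ (so that $q_{0:n} = Mv$) and that $m$ is the last column of $M^{-1}$, indexed by $j$; both function names are hypothetical.

```python
import numpy as np

def last_column_inverse_vandermonde(tau):
    """Last column of M^{-1}, where M[k, j] = tau_j**k for 0 <= k, j <= n."""
    tau = np.asarray(tau, dtype=float)
    n = len(tau) - 1
    # np.vander(..., increasing=True) has rows [1, tau_j, tau_j^2, ...]; transpose it.
    M = np.vander(tau, N=n + 1, increasing=True).T
    return np.linalg.inv(M)[:, n]

def closed_form_m(tau):
    """m_j = (-1)^n * prod_{i != j} 1 / (tau_i - tau_j)."""
    tau = np.asarray(tau, dtype=float)
    n = len(tau) - 1
    m = np.empty(n + 1)
    for j in range(n + 1):
        others = np.delete(tau, j)  # all tau_i with i != j
        m[j] = (-1) ** n / np.prod(others - tau[j])
    return m

if __name__ == "__main__":
    tau = [0, 1, 3, 4]
    print(last_column_inverse_vandermonde(tau))
    print(closed_form_m(tau))
    # Doubling every phase difference shrinks each |m_j|.
    print(np.abs(closed_form_m([0, 2, 6, 8])))
```

The two vectors agree up to rounding, and enlarging the phase differences decreases every $|m_j|$, as claimed in the text.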

For any two vectors $m$ and $m'$ corresponding to different sets of phases $\tau_j$ and $\tau'_j$ such that

$$\tau'_j - \tau'_i \ge \tau_j - \tau_i, \qquad j > i,$$

at least one of the inequalities being strict, it is clear that $|m'_j| \le |m_j|$, with at least two of the inequalities being strict. It is also clear that $m$ and $m'$ can be connected in a continuous way so as to respect this monotonicity property, namely there exists a sequence $m(t)$ indexed by some parameter $t \in [0,1]$ such that

• $m(0) = m$ and $m(1) = m'$;

• $m(t)$ is piecewise differentiable with bounded derivative $\dot{m}(t)$;

• the sign of $\dot{m}_j(t)$ matches that of $-m_j(t)$ componentwise; and

• $\dot{m}_j(t) \ne 0$ in at least two components at a time.

The corresponding values of $\mu = \mu(m(0))$ and $\mu' = \mu(m(1))$ obey

$$\mu' - \mu = \int_0^1 \frac{d}{dt}\mu(m(t))\,dt = \int_0^1 \dot{m}(t)^T\,\nabla_m\mu(m(t))\,dt = -2\int_0^1 \mu(m(t))\,\frac{\dot{m}(t)^Tm(t)}{m(t)^Tm(t)}\,dt.$$

By construction $\dot{m}(t)^Tm(t) < 0$ for all $t$, hence we reach the desired conclusion that $\mu < \mu'$.

□
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